---
id: langgraph
title: LangGraph
sidebar_label: LangGraph
---

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import { Timeline, TimelineItem } from "@site/src/components/Timeline";
import VideoDisplayer from "@site/src/components/VideoDisplayer";
import ColabButton from "@site/src/components/ColabButton";

# LangGraph

<ColabButton 
  notebookUrl="https://colab.research.google.com/github/confident-ai/deepeval/blob/main/examples/notebooks/langgraph.ipynb" 
  className="header-colab-button"
/>

[LangGraph](https://www.langchain.com/langgraph) is an open-source framework for building stateful, agentic applications with large language models, letting you compose LLM calls, tools, and control flow as graphs to create advanced multi-agent workflows.

:::tip
We recommend logging in to [Confident AI](https://app.confident-ai.com) to view your LangGraph evaluation traces.

```bash
deepeval login
```

:::

## End-to-End Evals

`deepeval` allows you to evaluate LangGraph applications end-to-end in **under a minute**.

<Timeline>

<TimelineItem title="Configure LangGraph">

Create a `CallbackHandler` with a list of [task completion metrics](/docs/metrics-task-completion) you wish to use, and pass it to your LangGraph application's `invoke` method.

```python title="main.py" showLineNumbers
from langgraph.prebuilt import create_react_agent

from deepeval.integrations.langchain import CallbackHandler
from deepeval.metrics import TaskCompletionMetric

task_completion_metric = TaskCompletionMetric()

def get_weather(city: str) -> str:
    """Returns the weather in a city"""
    return f"It's always sunny in {city}!"

agent = create_react_agent(
    model="openai:gpt-4o-mini",
    tools=[get_weather],
    prompt="You are a helpful assistant",
)

result = agent.invoke(
    input={"messages": [{"role": "user", "content": "what is the weather in sf"}]},
    config={"callbacks": [CallbackHandler(metrics=[task_completion_metric])]},
)
print(result)
```

:::info
Only [Task Completion](/docs/metrics-task-completion) is supported for the LangGraph integration. To use other metrics, manually [set up tracing](/docs/evaluation-llm-tracing) instead.
:::
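If you do need other metrics, a minimal sketch of the manual tracing approach might look like the following. It assumes `deepeval`'s `observe` and `update_current_span` tracing API and a hypothetical `ask_agent` wrapper around the agent defined above; check the [tracing docs](/docs/evaluation-llm-tracing) for the exact setup.

```python title="main.py" showLineNumbers
from deepeval.tracing import observe, update_current_span
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

# Hypothetical wrapper: wrap the LangGraph invocation in an observed component
# so that metrics other than task completion can be evaluated on it.
@observe(metrics=[AnswerRelevancyMetric()])
def ask_agent(query: str) -> str:
    result = agent.invoke(
        input={"messages": [{"role": "user", "content": query}]}
    )
    answer = result["messages"][-1].content
    update_current_span(test_case=LLMTestCase(input=query, actual_output=answer))
    return answer
```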

</TimelineItem>
<TimelineItem title="Run evaluations">

Create an `EvaluationDataset` and invoke your LangGraph application for each golden within the `evals_iterator()` loop to run end-to-end evaluations.

<Tabs groupId="langgraph">
<TabItem value="synchronous" label="Synchronous">

```python title="main.py" showLineNumbers
from deepeval.dataset import Golden, EvaluationDataset

goldens = [
    Golden(input="What is the weather in Bogotá, Colombia?"),
    Golden(input="What is the weather in Paris, France?"),
]
 
dataset = EvaluationDataset(goldens=goldens)
 
for golden in dataset.evals_iterator():
    agent.invoke(
        input={"messages": [{"role": "user", "content": golden.input}]},
        config={"callbacks": [CallbackHandler(metrics=[TaskCompletionMetric()])]}
    )
```

</TabItem>
  <TabItem value="asynchronous" label="Asynchronous">

```python title="main.py" showLineNumbers
import asyncio
from deepeval.dataset import Golden, EvaluationDataset

dataset = EvaluationDataset(goldens=[
    Golden(input="What is the weather in Bogotá, Colombia?"),
    Golden(input="What is the weather in Paris, France?"),
])
 
for golden in dataset.evals_iterator():
    task = asyncio.create_task(
        agent.ainvoke(
            input={"messages": [{"role": "user", "content": golden.input}]},
            config={"callbacks": [CallbackHandler(metrics=[TaskCompletionMetric()])]}
        )
    )
    dataset.evaluate(task)
```

</TabItem>
</Tabs>

✅ Done. The `evals_iterator` will automatically generate a test run with individual evaluation traces for each golden.

</TimelineItem>
<TimelineItem title="View on Confident AI (optional)">

<VideoDisplayer
  src="https://confident-bucket.s3.us-east-1.amazonaws.com/end-to-end%3Alanggraph.mp4"
/>

</TimelineItem>

</Timeline>

:::note
If you need to evaluate individual components of your LangGraph application, [set up tracing](/docs/evaluation-llm-tracing) instead.
:::

## Component-level Evals

Using `deepeval`, you can now evaluate individual components of your LangGraph application.

### LLM
Define `metrics` in the `metadata` of all the `BaseLanguageModel`s in your LangGraph application.

```python title="main.py" showLineNumbers
from langchain_openai import ChatOpenAI
from deepeval.metrics import AnswerRelevancyMetric
...

llm = ChatOpenAI(
    model="gpt-4o-mini", 
    metadata={"metric": [AnswerRelevancyMetric()]}
).bind_tools([get_weather])
```
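For reference, here is a minimal sketch of wiring this configured LLM into a graph and running it with a `CallbackHandler` so the component-level metric is evaluated on the LLM call. The single-node graph below is an assumed example (tool execution is omitted for brevity), not part of the integration itself.

```python title="main.py" showLineNumbers
from langgraph.graph import StateGraph, MessagesState, START, END
from deepeval.integrations.langchain import CallbackHandler

def call_llm(state: MessagesState):
    # The AnswerRelevancyMetric attached via `metadata` is evaluated on this call
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(MessagesState)
graph.add_node("llm", call_llm)
graph.add_edge(START, "llm")
graph.add_edge("llm", END)
app = graph.compile()

app.invoke(
    {"messages": [{"role": "user", "content": "What is the weather in Paris, France?"}]},
    config={"callbacks": [CallbackHandler()]},
)
```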

### Tool
To pass `metrics` to your tools, use DeepEval's LangChain `tool` decorator in place of LangChain's.
```python title="main.py" showLineNumbers
# from langchain_core.tools import tool
from deepeval.integrations.langchain import tool
from deepeval.metrics import AnswerRelevancyMetric
...

@tool(metric=[AnswerRelevancyMetric()])
def get_weather(location: str) -> str:
    """Get the current weather in a location."""
    return f"It's always sunny in {location}!"
```
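The decorated tool can then be registered with your agent exactly as before. A minimal sketch, assuming the decorator returns a standard LangChain tool that `create_react_agent` accepts:

```python title="main.py" showLineNumbers
# Register the decorated tool as usual; metrics attached via the decorator
# are evaluated on its tool calls when the agent runs with a CallbackHandler.
agent = create_react_agent(
    model="openai:gpt-4o-mini",
    tools=[get_weather],
    prompt="You are a helpful assistant",
)
```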

## Evals in Production

To run online evaluations in production, simply replace `metrics` in `CallbackHandler` with a [metric collection](https://www.confident-ai.com/docs/metrics/metric-collections) string from Confident AI, and push your LangGraph agent to production.

:::info
This will automatically evaluate all incoming traces in production with the task completion metrics defined in your [metric collection](https://www.confident-ai.com/docs/metrics/metric-collections).
:::

```python title="main.py" showLineNumbers
result = agent.invoke(
    input={"messages": [{"role": "user", "content": "What is the weather in Paris, France?"}]},
    config={"callbacks": [CallbackHandler(metric_collection="<metric-collection-name-with-task-completion>")]}
)
```